
[TEMP]feat: add torch profiler for the function update_weights#10

Merged
JensenFire merged 5 commits into JD-ETH:jd/rdma-integration from JensenFire:jsf/pr3-torch_profiler
Jan 7, 2026

Conversation

@JensenFire
Collaborator

Add a torch profiler for the update_weights part, with the flag --use-pytorch-profiler-update-weight. Users can find the file update_weights_call{step}_rank_{gpu_id}.pt.trace.json.gz. Open it with https://ui.perfetto.dev/ and the result looks something like:

[Screenshot: Perfetto trace view of the update_weights profiling output]

@JensenFire JensenFire requested review from JD-ETH and Risc-lt January 5, 2026 09:04
Owner

@JD-ETH JD-ETH left a comment


Vibe coding is convenient, and often even elegant, lol. But it tends to be very verbose and redundant, with fail-safes; let's try to be concise and careful here, and prioritize human readability first. It's better to throw errors and crash runs immediately for the profiling.

parser.add_argument(
"--profile-update-weight-start",
type=int,
default=0,
Owner


let's just have the start default to 1, and log everything afterwards? We only do 3 training steps anyways and this profiler will only be used for our profiling configs.

Collaborator Author


you mean we could skip the first step?

)
parser.add_argument("--check-weight-update-equal", action="store_true")
parser.add_argument(
"--use-pytorch-profiler-update-weight",
Owner


does this impact performance of the run? If not, let's just set it default on

Collaborator Author


If we're not targeting merging this feature into the official slime repo, I think it's ok. I'll also tag this PR with [TEMP] and we'll revert it in the end.

Owner


aight

default=0,
help="After enabling PyTorch profiler for weight update operations, start profiling from this point. Requires --tensorboard-dir to be set.",
)
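[Editor's note] The argument fragments quoted in this thread are scattered diff context. As a hedged reconstruction only: the flag names below come from the visible diff, but the action types, defaults (including the reviewer's suggested start default of 1), and help texts are assumptions, not the PR's actual code.

```python
# Hypothetical consolidation of the flags discussed in this review thread.
import argparse

parser = argparse.ArgumentParser()
# Assumed to be a boolean switch; the diff context does not show its action type.
parser.add_argument(
    "--use-pytorch-profiler-update-weight",
    action="store_true",
    help="Profile only the update_weights path with torch.profiler.",
)
parser.add_argument(
    "--profile-update-weight-start",
    type=int,
    default=1,  # per the review suggestion: start at 1 and log everything afterwards
    help="Start profiling weight updates from this step. Requires --tensorboard-dir.",
)
```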
parser.add_argument(
Owner


and I'd prefer we remove this

Collaborator Author

@JensenFire JensenFire Jan 6, 2026


or keep it, just like the profiler of slime itself?

Owner


oh they have this also?

Collaborator Author

@JensenFire JensenFire Jan 7, 2026


This is the interesting part. Slime and Megatron-LM share the same argument use-pytorch-profiler (/root/Megatron-LM/megatron/training/arguments.py). We can also find its usage in TrainProfiler in /root/slime/slime/utils/profile_utils.py:

class TrainProfiler:
    def __init__(self, args):
        self.args = args
        self._torch_profiler_overall = None
        self._memory_profiler_overall = None

        if args.use_pytorch_profiler and ("train_overall" in args.profile_target):
            self._torch_profiler_overall = _create_torch_profiler(args, name="train_overall")

        if args.record_memory_history and ("train_overall" in args.profile_target):
            self._memory_profiler_overall = _BaseMemoryProfiler.create(args)
            self._memory_profiler_overall.start()

Basically, when --use-pytorch-profiler is enabled, it records all Python functions between steps, and the trace gets quite large (>100MB) if we want the mapping between Python functions and CPU/GPU occupancy. Besides that, there are too many redundant parts in this profiler, since all we care about is updating weights. That's why I created a separate function-scoped profiler.
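[Editor's note] A minimal sketch of what such a function-scoped profiler could look like, assuming torch is available; this is not the PR's implementation. torch.profiler's tensorboard_trace_handler with use_gzip=True already emits *.pt.trace.json.gz traces, so wrapping just the update_weights call keeps traces small. The helper name and step-gating parameter are illustrative assumptions:

```python
# Hedged sketch, not the PR's actual code: profile only the update_weights
# call instead of the whole training loop.
import torch
from torch.profiler import profile, ProfilerActivity, tensorboard_trace_handler

def profile_update_weights(fn, step, out_dir, start_step=1):
    """Run `fn`; from `start_step` onward, run it under torch.profiler.

    tensorboard_trace_handler(..., use_gzip=True) writes traces named like
    <worker>.<timestamp>.pt.trace.json.gz into `out_dir`, openable at
    https://ui.perfetto.dev/. The PR's own per-step/per-rank naming
    (update_weights_call{step}_rank_{gpu_id}) is not reproduced here.
    """
    if step < start_step:
        return fn()
    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        activities.append(ProfilerActivity.CUDA)
    with profile(
        activities=activities,
        on_trace_ready=tensorboard_trace_handler(out_dir, use_gzip=True),
    ):
        return fn()
```

Compared with the global --use-pytorch-profiler, this only captures the region of interest, which is what keeps the trace well under the >100MB sizes mentioned above.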

@JensenFire
Collaborator Author

Updated @JD-ETH


@JensenFire JensenFire changed the title feat: add torch profiler for the function update_weights [TEMP]feat: add torch profiler for the function update_weights Jan 7, 2026
@JensenFire
Collaborator Author

Based on the comments, I've marked this PR as [TEMP]; it will be reverted in the future.

@JensenFire JensenFire merged commit a6715ba into JD-ETH:jd/rdma-integration Jan 7, 2026
1 check passed
Risc-lt pushed a commit that referenced this pull request Jan 28, 2026
Add torch profiler for the update_weights part, with the tag `--use-pytorch-profiler-update-weight`. Users could find the file `update_weights_call{step}_rank_{gpu_id}.pt.trace.json.gz`. Open it with `https://ui.perfetto.dev/`